Abstract

Sparse coding networks, which utilize unsupervised learning to maximize coding efficiency, have
successfully reproduced response properties found in primary visual cortex [1]. However,
conventional sparse coding models require that the coding circuit can fully sample the sensory
data in a one-to-one fashion, a requirement not supported by experimental data from the
thalamo-cortical projection. To relieve these strict wiring requirements, we propose a sparse
coding network constructed by introducing synaptic learning in the framework of compressed
sensing. We demonstrate that the new model evolves biologically realistic, spatially smooth
receptive fields despite the fact that the feedforward connectivity subsamples the input and thus
the learning has to rely on an impoverished and distorted account of the original visual data.
Further, we demonstrate that the model could form a general scheme of cortical communication: it
can form meaningful representations in a secondary sensory area, which receives input from the
primary sensory area through a "compressing" cortico-cortical projection. Finally, we prove that
our model belongs to a new class of sparse coding algorithms in which recurrent connections are
essential in forming the spatial receptive fields.



1 Introduction 

Guided by the early ideas on efficient sensory coding 13, self-organizing network models for 
sparse coding have been critical in understanding how essential response properties, such as orien- 
tation selectivity, are formed in sensory areas through development and experience dlH. There is 
now a wealth of such models, all based on a set of similar connectivity patterns: a neuron receives 
feedforward drive from the afferent input and competes with other neurons in the network through 
mainly inhibitory lateral connections, see Q] H . Some of these models are capable of
reproducing the response properties in primary visual cortex quantitatively, for instance, the
network model proposed in [6], which implements an algorithm called optimized orthogonal
matching pursuit ®.

While these models match physiological data quite impressively, their correspondence to the 
anatomical connectivity in cortex is problematic. According to the models, the neurons must have 
access to the full data, for instance, to all pixels of an image patch. Many models even suggest that 
each neuron has the feedforward wiring in place so that the synaptic structure in the feedforward
path can match the receptive field exactly. It is unclear if the development of the thalamic
projections into V1 can reach such connection density and microscopic precision, even though
thalamic receptive fields do match the receptive fields of monosynaptically connected V1 cells
with some precision [10, 11]. Here we explore learning schemes for neural representations that
relieve these requirements on the feedforward wiring. In addition, we assess the ability of these
learning schemes to account for learning in cortico-cortical projections, for which it has been
established that only a fraction of local cells in the origin area send fibers to a target area
[12] and therefore conventional sparse coding on the receiver end could not work.

To construct sparse coding networks with less restrictive wiring conditions we build on compressed 
sensing or compressed sampling, a method originally developed for data compression by subsam- 
pling. The decompression step in these algorithms has a close similarity to sparse coding models 
and thus these methods can form a framework for developing a new class of neural networks for 
self-organizing neural representations in cortical areas. Specifically, we explore the hypothesis of 
a generic scheme of cortical communication in which each cortical area unwraps subsampled input 
data into a sparse code to perform local computations and then sends a subsampled version of its 
local representation to other cortical areas. 

2 Adaptive compressed sensing 

Conventional sparse coding is governed by the following objective function: 

$$E(x, a, \Psi) = \frac{1}{2}\|x - \Psi a\|^2 + S(a). \tag{1}$$

Here $x \in \mathbb{R}^m$ is the input data, $\Psi$ is a real $m \times n$ matrix whose columns
form a dictionary for constructing the input, and $a$ is a coefficient vector for this
reconstruction: $\hat{x} = \Psi a$. The function $S(a)$ is a sparseness constraint that penalizes
neural activity and forces the coefficient vector to be sparse.

For a given input $x$, the sparse coding operation is given by an energy minimization

$$a(x) := \operatorname*{argmin}_{a} E(x, a, \Psi) \in \mathbb{R}^n. \tag{2}$$

The adaptation of the dictionary to the data is performed by minimizing $E(x, a(x), \Psi)$ from
Eq. (1) and Eq. (2) with respect to $\Psi$. Using gradient descent for the adaptation yields a
Hebbian synaptic learning rule for the $\Psi$ components (HE).
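
To make the link between Eq. (1) and the Hebbian rule concrete, here is a minimal NumPy sketch. It assumes the penalty $S(a) = \lambda\|a\|_0$ (the choice the paper adopts later for ACS); the function names, the learning rate `lr`, and the toy sizes are illustrative and not taken from the original. For a fixed code $a$, the gradient of Eq. (1) with respect to $\Psi$ is $-(x - \Psi a)a^T$, so a descent step adds the outer product of the reconstruction residual and the activity.

```python
import numpy as np

def energy(x, a, Psi, lam):
    """Eq. (1), assuming S(a) = lam * ||a||_0."""
    r = x - Psi @ a
    return 0.5 * float(r @ r) + lam * np.count_nonzero(a)

def grad_Psi(x, a, Psi):
    """Analytic gradient of Eq. (1) with respect to Psi for a fixed code a."""
    return -np.outer(x - Psi @ a, a)

def hebbian_step(x, a, Psi, lr=0.01):
    """One gradient-descent step on Psi: residual times activity."""
    return Psi - lr * grad_Psi(x, a, Psi)

# Illustrative sizes (not from the paper): m = 8 inputs, n = 12 units.
rng = np.random.default_rng(0)
m, n = 8, 12
Psi = rng.standard_normal((m, n))
a = np.zeros(n)
a[[1, 5, 7]] = [0.8, -1.2, 0.5]          # a sparse code with three active units
x = rng.standard_normal(m)
Psi_new = hebbian_step(x, a, Psi)        # lowers the energy for this (x, a) pair
```

The sparseness term does not depend on $\Psi$, which is why only the reconstruction residual appears in the update.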

Compressed sensing has been proposed as a technique for data compression using a random
projection matrix $\Phi$ to compress the data $x \in \mathbb{R}^m$ to $\Phi x \in \mathbb{R}^k$
with $k < m$. The decompression uses energy minimization of an error-based energy function very
similar to Eq. (1):

$$E(x, a, \Psi) = \frac{1}{2}\|\Phi x - \Phi\Psi a\|^2 + S(a). \tag{3}$$

The original data is reconstructed as $\hat{x} = \Psi\, a(\Phi x)$. In conventional compressed
sensing a fixed dictionary $\Psi$ is chosen. The decompression can be shown to work if (i) a
dictionary $\Psi$ is chosen in which the data can be sparsely represented, (ii) the two matrices
$\Phi$ and $\Psi$ are incoherent, and (iii) the dimension $k$ of the compressed data is larger
than the sparsity of the data [13, 14].
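
The compression and decompression steps can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the paper: the dictionary is a random orthonormal basis, $\Phi$ is Gaussian, the sizes are arbitrary, and the energy minimization of Eq. (3) is replaced by a plain orthogonal matching pursuit (the helper `omp` is hypothetical) with a known number of active coefficients.

```python
import numpy as np

def omp(D, y, n_active):
    """Orthogonal matching pursuit: greedily pick n_active columns of D and
    refit their coefficients by least squares after every pick."""
    norms = np.linalg.norm(D, axis=0)
    a = np.zeros(D.shape[1])
    support = []
    residual = y.copy()
    for _ in range(n_active):
        scores = np.abs(D.T @ residual) / norms
        if support:
            scores[support] = -np.inf    # never pick a column twice
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        a[:] = 0.0
        a[support] = coef
        residual = y - D @ a
    return a

# Illustrative sizes (assumptions): m = n = 64, k = 32, three active causes.
rng = np.random.default_rng(1)
m, n, k, s = 64, 64, 32, 3
Psi, _ = np.linalg.qr(rng.standard_normal((m, n)))   # fixed orthonormal dictionary
Phi = rng.standard_normal((k, m))                    # random projection
a_true = np.zeros(n)
a_true[rng.choice(n, size=s, replace=False)] = [2.0, -2.0, 2.0]
x = Psi @ a_true

y = Phi @ x                        # compression: k numbers instead of m
a_hat = omp(Phi @ Psi, y, s)       # decompression: sparse code of y in Phi Psi
x_hat = Psi @ a_hat                # reconstruction of the original data
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```

When the greedy search finds the correct support, `rel_err` is at the level of floating-point round-off; this is typical for sufficiently sparse data but is not guaranteed for every random draw.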

Building on a previous model [6], we introduce adaptive compressed sensing (ACS), an adaptive
version of compressed sensing governed by:

$$E(x, a, \Phi, \Theta) = \frac{1}{2}\|\Phi x - \Theta a\|^2 + \lambda\|a\|_{L_0} \tag{4}$$

$$\phantom{E(x, a, \Phi, \Theta)} = -x^T\Phi^T\Theta a + \frac{1}{2}a^T\Theta^T\Theta a + \lambda\|a\|_{L_0} + \text{const.}$$

Learning is executed by gradient descent in $\Theta$, in exactly the same fashion as in
conventional sparse coding, e.g. [1, 6]. Note, however, the difference between ACS and
conventional sparse coding. The new algorithm (4) forms a dictionary of the compressed data, the
$k \times n$ matrix $\Theta$, whereas conventional sparse coding forms a dictionary of the
original data, an $m \times n$ matrix. Although we use here the $L_0$-sparseness constraint to
penalize the number of active units, $S(a) = \lambda\|a\|_{L_0}$, similar schemes of adaptive
compressed sensing can be realized with other types of sparseness constraints.
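
As a quick numerical check of Eq. (4), the sketch below evaluates both forms of the ACS energy and takes one learning step on $\Theta$. The constant in the second line of Eq. (4) is $\frac{1}{2}\|\Phi x\|^2$, which depends on neither $a$ nor $\Theta$. Function names, sizes, and the learning rate are illustrative assumptions.

```python
import numpy as np

def acs_energy(x, a, Phi, Theta, lam):
    """First line of Eq. (4)."""
    r = Phi @ x - Theta @ a
    return 0.5 * float(r @ r) + lam * np.count_nonzero(a)

def acs_energy_expanded(x, a, Phi, Theta, lam):
    """Second line of Eq. (4); the constant is 0.5 * ||Phi x||^2."""
    y = Phi @ x
    const = 0.5 * float(y @ y)
    smooth = float(-y @ Theta @ a + 0.5 * a @ Theta.T @ Theta @ a)
    return smooth + lam * np.count_nonzero(a) + const

def theta_step(x, a, Phi, Theta, lr=0.01):
    """Gradient-descent step on Theta; it only ever sees the compressed data Phi x."""
    return Theta + lr * np.outer(Phi @ x - Theta @ a, a)

rng = np.random.default_rng(2)
m, k, n = 16, 6, 24                      # illustrative sizes with k < m
Phi = rng.standard_normal((k, m))
Theta = rng.standard_normal((k, n))
x = rng.standard_normal(m)
a = np.zeros(n)
a[[0, 3, 10]] = [1.0, -0.5, 0.7]

e_direct = acs_energy(x, a, Phi, Theta, lam=0.1)
e_expanded = acs_energy_expanded(x, a, Phi, Theta, lam=0.1)
Theta_new = theta_step(x, a, Phi, Theta)
```

Note that `theta_step` never touches $x$ directly, only $\Phi x$, which is the sense in which learning in ACS has no access to the original data.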

Network implementation of ACS: Analogous to earlier models of sparse coding, the coding in ACS
can be implemented in a network in which each neuron $i$ computes the gradient of the two
differentiable terms in Eq. (4) as

$$\frac{\partial E}{\partial a_i} = -(x^T\Phi^T\Theta)_i + (\Theta^T\Theta a)_i, \tag{5}$$

see [6] for further detail. In the neural network for the ACS method the feedforward weights are
$FF := \Phi^T\Theta$ and the competitive feedback weights are $FB := -\Theta^T\Theta$. Note that
if $\Phi$ is the identity matrix, ACS coincides with conventional sparse coding, for which the
corresponding neural network would be defined by $FF = \Psi$ and $FB = -FF^T FF$ (HIS). The
important difference between the two wiring schemes is that the feedforward weights of ACS
subsample and mix the original data. Thus, coding and weight adaptation in ACS lack the full
access to the original data that is available to conventional sparse coding. Remarkably, the
simulation experiments described in the next section demonstrate that the neurons in the ACS
network still develop biologically realistic receptive fields, despite the limited exposure to
the original data.
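
The wiring described above can be written down directly. The following sketch builds $FF$ and $FB$ from $\Phi$ and $\Theta$ and evaluates the gradient of Eq. (5) for all neurons at once; the function names and toy sizes are assumptions made for illustration.

```python
import numpy as np

def acs_weights(Phi, Theta):
    """Feedforward and feedback weights of the ACS network."""
    FF = Phi.T @ Theta          # m x n: maps the raw input x onto the n neurons
    FB = -Theta.T @ Theta       # n x n: competitive lateral interactions
    return FF, FB

def activity_gradient(x, a, FF, FB):
    """Eq. (5) for all neurons: -(x^T FF)_i - (FB a)_i."""
    return -(x @ FF) - FB @ a

rng = np.random.default_rng(3)
m, k, n = 16, 6, 24                      # illustrative sizes with k < m
Phi = rng.standard_normal((k, m))
Theta = rng.standard_normal((k, n))
FF, FB = acs_weights(Phi, Theta)
x = rng.standard_normal(m)
a = rng.standard_normal(n)
g = activity_gradient(x, a, FF, FB)

# With Phi equal to the identity the wiring reduces to conventional sparse coding.
FF_id, FB_id = acs_weights(np.eye(k), Theta)
```

In the identity case `FF_id` equals the dictionary itself and `FB_id` equals `-FF_id.T @ FF_id`, matching the conventional network described in the text.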



3 Simulation experiments with adaptive compressed sensing 

The coding networks described in Section 2 were compared in their ability to code patches of
natural scene images and form receptive fields. The images were preprocessed by "whitening," as
described in (H. The coding circuits encoded patches of 12 x 12 pixels, making the dimension of
the data $m = 144$. For ACS we used a sampling matrix $\Phi$ that downsampled the original data
to $k = 60$ dimensions. All our coding circuits contained $n = 432$ neurons, thereby producing
representations of the original data $a \in \mathbb{R}^n$ that were three times overcomplete. In
addition to image coding in a primary sensory area, we also tested whether the ACS model could be
used by a secondary sensory area (2nd stage). Our model of the 2nd stage receives a subsampled
version of the sparse code generated in the primary visual area,
$\Phi_2 a(x) \in \mathbb{R}^k$, and produces a sparse code $a_2 \in \mathbb{R}^n$, again with
$k = 60$ and $n = 432$. All models used a coefficient $\lambda = 0.1$ in the sparseness
constraint of Eq. (4).
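
The dimensions of the two-stage setup can be traced with a short sketch. Only $m = 144$, $k = 60$, and $n = 432$ come from the text. Everything else is an assumption for illustration: the matrices are random rather than learned, the input is noise instead of a whitened image patch, and the coder is a greedy pursuit with a fixed number of active units (hypothetical helper `sparse_code`) rather than the minimization with $\lambda = 0.1$.

```python
import numpy as np

def sparse_code(D, y, n_active=8):
    """Greedy L0 coder (orthogonal matching pursuit) used as a stand-in for Eq. (2)."""
    norms = np.linalg.norm(D, axis=0)
    a = np.zeros(D.shape[1])
    support = []
    residual = y.copy()
    for _ in range(n_active):
        scores = np.abs(D.T @ residual) / norms
        if support:
            scores[support] = -np.inf
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        a[:] = 0.0
        a[support] = coef
        residual = y - D @ a
    return a

rng = np.random.default_rng(4)
m, k, n = 144, 60, 432                   # 12 x 12 patches, k = 60, n = 432
Phi1 = rng.standard_normal((k, m))       # subsampling of the image patch
Theta1 = rng.standard_normal((k, n))     # untrained stand-in for the learned dictionary
Phi2 = rng.standard_normal((k, n))       # subsampling of the primary sparse code
Theta2 = rng.standard_normal((k, n))     # untrained stand-in for the 2nd-stage dictionary

x = rng.standard_normal(m)               # placeholder for a whitened image patch
a1 = sparse_code(Theta1, Phi1 @ x)       # primary area: sparse code of Phi1 x
a2 = sparse_code(Theta2, Phi2 @ a1)      # 2nd stage: sparse code of Phi2 a1
```

Both stages see only 60 numbers per stimulus and expand them into a 432-dimensional sparse code, which is the repeated compression and expansion the model proposes.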

Since the ACS model learns a dictionary of the compressed data rather than the original data, the
original image cannot be reconstructed from the adapted matrix $\Theta$. Note that computing the
data dictionary $\Psi$ from $\Theta$ would require an ill-posed step of matrix factorization:
$\Theta = \Phi\Psi$. Therefore, to assess the quality of the emerging codes in the ACS model we
measured receptive fields in the trained circuit (as physiologists do from the responses of real
neurons). We compute the receptive fields for a set $I$ of visual stimuli $x \in \mathbb{R}^m$ as

$$RF := \frac{1}{|I|}\sum_{x \in I} x \cdot a(x)^T. \tag{6}$$

Notice that $RF$ is an $m \times n$ real matrix, the $i$-th column representing the receptive
field of the $i$-th neuron.
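
The receptive field measurement of Eq. (6) amounts to averaging outer products of stimulus and response. A minimal sketch follows, assuming stimuli and codes are stored column-wise; the linear "coder" `W` used to generate responses is a hypothetical stand-in for $a(x)$, chosen because for white-noise stimuli its receptive fields should come out close to `W` itself.

```python
import numpy as np

def receptive_fields(X, A):
    """Eq. (6): average of x a(x)^T over the stimulus set.
    X is m x N (one stimulus per column), A is n x N (the matching codes)."""
    return X @ A.T / X.shape[1]

rng = np.random.default_rng(8)
m, n, N = 8, 5, 20_000
W = rng.standard_normal((m, n))      # hypothetical linear filters, one per column
X = rng.standard_normal((m, N))      # white-noise stimuli, one per column
A = W.T @ X                          # toy linear responses standing in for a(x)
RF = receptive_fields(X, A)          # m x n, column i is the field of neuron i
rel_err = np.linalg.norm(RF - W) / np.linalg.norm(W)
```

For a sparse coder the responses are nonlinear in $x$, so the measured fields need not equal any single weight matrix, which is the point examined in Section 4.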

The following two figures show the feedforward weights and the receptive fields of the different 
coding circuits. While the feedforward weights and receptive fields in Fig. 1 are very similar for
sparse coding, they are markedly different for ACS in Fig. 2. Interestingly, while subsampling makes
the feedforward weights somewhat amorphous and noisy, the resulting receptive fields of ACS are 
smooth and resemble the receptive fields of sparse coding. When used in a secondary sensory area 
(2nd stage), ACS forms response properties that are similar to those in the primary sensory area, 
though the response properties differ on a neuron-by-neuron basis. 

To assess the degree to which the sparse codes describe the original input, we computed image
reconstructions. For sparse coding we used the basis functions $\Psi$, for ACS the measured
receptive fields $RF$. Fig. 3 demonstrates that ACS forms representations in the primary and
secondary area that can be used for reconstruction, although they do not reach the quality of
the reconstructions obtained from conventional sparse coding.

Fig. 4 compares the reconstruction quality of conventional compressed sensing (using the basis
functions that were adapted to the original data) and adaptive compressed sensing. The mean
reconstruction qualities do not differ, though ACS performs with lower variance over the set of
input patches we tested.

[Figure 1 panels: SC $\Psi$, SC RF]

Figure 1: Feedforward weights and receptive fields of the sparse coding circuit [6]. The patterns
look indistinguishable to the eye and can actually be proven to be the same up to a scalar
factor, see Theorem 1.

[Figure 2 panels: ACS FF, ACS RF, ACS (2nd stage) RF]

Figure 2: Feedforward weights (ACS FF) and receptive fields (ACS RF) of the adaptive compressed
sensing circuit. The plot "ACS (2nd stage) RF" depicts receptive fields learned in a cascaded
secondary sensory area receiving $\Phi_2 a(x)$ as the input.

These simulation results suggest that ACS is able to form representations of sensory data that convey 
its essential structure although the coding network receives only a subsampled version of the data. 
In the following section we try to understand the mathematical differences between conventional 
sparse coding and adaptive compressed sensing. 

4 Differences between ACS and conventional sparse coding models 

In this section we derive two theorems to establish that ACS defines a class of sparse coding algo- 
rithms whose properties differ qualitatively from those of conventional sparse coding models. 

[Figure 3 panels: Original, Sparse Coding, Conventional CS, ACS, ACS (2nd stage)]

Figure 3: Original image and reconstruction using the different methods on 12 by 12 image patches.
For reconstructing the images from the representations formed by ACS (in the first and 2nd stage),
we used the receptive fields.

[Figure 4 panels: <SNR> vs. Usage (CS = 60, $\lambda$ = 0.1); Histogram of CS Usage]

Figure 4: Left: Signal-to-noise ratio (mean and range of standard deviation) in reconstructions with
conventional compressed sensing (red) and with ACS (blue). The means do not differ significantly,
but the performance of ACS has smaller variance. Right: The histograms of the number of used
neurons per reconstruction are very similar for both methods.



4.1 Receptive fields and feedforward weights coincide in conventional sparse coding networks

Assuming that $x$ is a column vector of random variables on a measure space $\Omega$ with
probability measure $\mu$, the matrix $RF$ will be an approximation to the correlation of $x$
and $a(x)$:¹

$$\mathrm{Cor}(x, a) = \int x\, a(x)^T\, d\mu.$$

The strong law of large numbers guarantees that, given enough samples, the matrix $RF$ will be
close to the integral above. Moreover, inequalities such as Hoeffding's [15] give analytic
estimates that make this relationship more precise. For these reasons (and expositional clarity),
we shall make the assumption that $RF = \mathrm{Cor}(x, a)$. We are interested in calculating the
necessary relationships between the quantities $RF$, $FF$, $FB$, and $\Theta$. (Recall that
$FF = \Phi^T\Theta$ and $FB = -\Theta^T\Theta$.)

In our setup, the data $x$ are assumed to come from a sparse number $k$ of independent causes
(nonzero values in $a$). Moreover, the method of recovering $a(x)$ from a particular $x$ is
assumed to be exact (or near exact) in solving Eq. (2), and the recovered coefficients are
assumed to be independently distributed; that is, $\Phi x = \Theta a(x)$ and we have:

$$c\,I = \int a(x)\, a(x)^T\, d\mu,$$

in which $I$ is the $n \times n$ identity matrix and $c$ is a positive constant. With these
assumptions, one calculates:

$$\Phi\, RF = \int \Phi x\, a(x)^T\, d\mu = \int \Theta\, a(x)\, a(x)^T\, d\mu = c\,\Theta. \tag{7}$$

In particular, this implies the following.

Theorem 1 If $\Phi$ is the identity, then the receptive fields are a scalar multiple of the
feedforward weights.
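
Theorem 1 can be illustrated numerically under the assumptions of this section. The sketch below is a toy setting, not the paper's simulation: the causes are drawn as independent values in {-1, 0, +1} with activation probability `p` (so that the second-moment matrix of $a$ is approximately `p` times the identity), the dictionary is random, and exact recovery is imposed by using the true causes as $a(x)$.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, N, p = 12, 20, 100_000, 0.1
Theta = rng.standard_normal((m, n))
# Independent, zero-mean, sparse causes: each unit is +1 or -1 with probability p/2 each.
A = rng.choice([-1.0, 0.0, 1.0], size=(n, N), p=[p / 2, 1 - p, p / 2])
Phi = np.eye(m)                       # conventional sparse coding: no subsampling
X = Theta @ A                         # exact-recovery assumption: Phi x = Theta a(x)
RF = X @ A.T / N                      # sample estimate of the receptive fields, Eq. (6)
FF = Phi.T @ Theta                    # feedforward weights

# Best scalar t such that RF is close to t * FF (Frobenius norm).
t_star = np.trace(RF.T @ FF) / np.linalg.norm(FF) ** 2
rel_residual = np.linalg.norm(RF - t_star * FF) / np.linalg.norm(RF)
```

With enough samples `t_star` approaches the constant $c$ (here `p`) and `rel_residual` shrinks toward zero, so the receptive fields are, up to sampling noise, a scalar multiple of the feedforward weights.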

We remark that one can easily check the scalar multiple condition by way of a quadratic
optimization problem in one real variable. Namely, given two matrices $A$ and $B$ of the same
size (with $B \neq 0$), we have

$$\operatorname*{argmin}_{t}\|A - tB\| = \frac{\operatorname{tr}(A^T B)}{\|B\|^2}, \qquad
\min_{t}\|A - tB\| = \sqrt{\|A\|^2 - \frac{\operatorname{tr}(A^T B)^2}{\|B\|^2}}. \tag{8}$$

Thus, a measure of whether $A$ is a scalar multiple of $B$ is given by the right-most quantity
in (8).
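
The scalar-multiple test of Eq. (8) is a one-line least-squares fit. The sketch below assumes the Frobenius norm, which is the norm for which the trace formula for the minimizer holds; the function name is illustrative.

```python
import numpy as np

def scalar_multiple_fit(A, B):
    """Return (t_star, distance): the minimizer of ||A - t B|| over real t
    and the minimum value, both in the Frobenius norm."""
    t_star = np.trace(A.T @ B) / np.linalg.norm(B) ** 2
    distance = np.linalg.norm(A - t_star * B)
    return t_star, distance

rng = np.random.default_rng(9)
B = rng.standard_normal((4, 6))
t_exact, d_exact = scalar_multiple_fit(3.0 * B, B)             # an exact multiple of B
t_rand, d_rand = scalar_multiple_fit(rng.standard_normal((4, 6)), B)
```

For the exact multiple the fitted scalar is 3 and the distance vanishes up to round-off, whereas for an unrelated random matrix the distance stays clearly positive.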



¹For our purposes, we shall assume that $a(x)$ in fact forms a vector of random variables; that
is, it is a vector of measurable functions on $\Omega$. Moreover, we shall assume that $a(x)$ has
zero mean.
4.2 In the ACS model receptive fields are co-shaped by feedback

In the compressive sensing regime, the matrix $\Phi$ is no longer the identity but instead a
compressive sampling matrix. In this case, the receptive fields are almost never scalar multiples
of the feedforward weights. A precise analytic relationship is given by the following theorem. As
an important consequence, we shall obtain the qualitative interpretation found in Theorem 4
below. One may skip the technical argument without sacrificing much intuition.



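Theorem 2

$$\min_{t}\|RF - t\,FF\| \;\geq\; \frac{\|\Phi RF\|^2}{2\,\|RF\| \cdot \|\Theta\|^2 \cdot \|\Phi^T\Phi\|} \cdot \min_{t}\|\Theta^T\Theta - t\,FF^T\Phi^T\Phi FF\|.$$

Proof: Set $l = \operatorname*{argmin}_{t}\|RF - t\,FF\| = \frac{\operatorname{tr}(RF^T FF)}{\|FF\|^2}$
and let $r = \min_{t}\|\Theta^T\Theta - t\,FF^T\Phi^T\Phi FF\|$. Notice that
$c = \frac{\|\Phi RF\|}{\|\Theta\|}$ and $c^2\,\Theta^T\Theta = RF^T\Phi^T\Phi RF$ from (7). Now,
consider the following chain of equalities and inequalities:

$$\begin{aligned}
c^2 r &\leq \|c^2\,\Theta^T\Theta - l^2\,FF^T\Phi^T\Phi FF\| \\
&= \|c^2\,\Theta^T\Theta - RF^T\Phi^T\Phi RF + RF^T\Phi^T\Phi RF - l^2\,FF^T\Phi^T\Phi FF\| \\
&= \|RF^T\Phi^T\Phi RF - l\,FF^T\Phi^T\Phi RF + l\,FF^T\Phi^T\Phi RF - l^2\,FF^T\Phi^T\Phi FF\| \\
&\leq \|RF^T\Phi^T\Phi RF - l\,FF^T\Phi^T\Phi RF\| + |l| \cdot \|FF^T\Phi^T\Phi RF - l\,FF^T\Phi^T\Phi FF\| \\
&\leq \left(\|RF\| + |l| \cdot \|FF\|\right) \cdot \|\Phi^T\Phi\| \cdot \|RF - l\,FF\| \\
&\leq 2\,\|RF\| \cdot \|\Phi^T\Phi\| \cdot \|RF - l\,FF\| \\
&= 2\,\|RF\| \cdot \|\Phi^T\Phi\| \cdot \min_{t}\|RF - t\,FF\|.
\end{aligned} \tag{9}$$

Here, we have used that $|l| \leq \frac{\|RF\|}{\|FF\|}$, which is a consequence of the
Cauchy-Schwarz inequality. The theorem now follows by rearranging terms in (9). □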
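
The bound of Theorem 2 can be sanity-checked numerically. The construction below is an assumption made only for the check: $\Phi$ and $RF$ are drawn at random, and $\Theta$ is then defined so that the relation $\Phi\,RF = c\,\Theta$ of Eq. (7) holds exactly for a chosen constant `c`. All norms are Frobenius norms, for which the submultiplicative steps of the proof are valid.

```python
import numpy as np

def min_dist(A, B):
    """min over t of ||A - t B|| in the Frobenius norm."""
    t = np.trace(A.T @ B) / np.linalg.norm(B) ** 2
    return np.linalg.norm(A - t * B)

rng = np.random.default_rng(6)
m, k, n = 14, 6, 20                      # illustrative sizes with k < m
Phi = rng.standard_normal((k, m))
RF = rng.standard_normal((m, n))
c = 0.3                                  # any positive constant
Theta = Phi @ RF / c                     # enforce Eq. (7): Phi RF = c Theta
FF = Phi.T @ Theta

G = Phi.T @ Phi
lhs = min_dist(RF, FF)
coeff = c ** 2 / (2 * np.linalg.norm(RF) * np.linalg.norm(G))
rhs = coeff * min_dist(Theta.T @ Theta, FF.T @ G @ FF)
```

In this construction `lhs` is at least `rhs`, as the chain of inequalities in the proof requires, and `c` coincides with the ratio of the Frobenius norms of `Phi @ RF` and `Theta`.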

What is important here is not the technical statement of Theorem 2 but rather the following
qualitative version of the result.

Corollary 1 If $\Theta^T\Theta$ is not (close to) a scalar multiple of $FF^T\Phi^T\Phi FF$, then
$RF$ is not (close to) a scalar multiple of $FF$.

In fact, we can do somewhat better than this. 

Theorem 3 Suppose that $\Theta\Theta^T$ is invertible. If the feedback weights are not a scalar
multiple of $FF^T FF$, then the receptive fields are not a scalar multiple of the feedforward
weights.

Proof: Suppose that $FB$ is not a scalar multiple of $FF^T FF$. We shall prove that the
hypothesis on $\Theta\Theta^T$ forces $FB \neq t\,FF^T\Phi^T\Phi FF$ for all $t$; the conclusion
will then follow from Corollary 1. Suppose by way of contradiction that
$FB = t\,FF^T\Phi^T\Phi FF$ for some $t$ (which will necessarily be negative). Then we have
$-(\Theta\Theta^T)^2 = t\,\Theta\Theta^T\Phi\Phi^T\Phi\Phi^T\Theta\Theta^T$, and thus
$t(\Phi\Phi^T)^2 = -I$. Since positive semidefinite matrices have unique positive semidefinite
square roots [15], it follows that $\sqrt{-t}\,\Phi\Phi^T = I$. Therefore, we have

$$FB = -\Theta^T\Theta = -\sqrt{-t}\,\Theta^T\Phi\Phi^T\Theta = -\sqrt{-t}\,FF^T FF,$$

a contradiction. □

Finally, we remark that in the compressive sensing regime $k \ll n$ and $\Phi$ is a random
matrix; thus, the two hypotheses of the previous theorem will be satisfied generically:

Theorem 4 In the adaptive compressed sensing regime, the receptive fields are almost surely not 
scalar multiples of the feedforward weights. 
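
The claim that the hypotheses of Theorem 3 hold generically can be probed with random matrices. In the sketch below the sizes match the simulations, but both $\Phi$ and $\Theta$ are Gaussian draws; the random $\Theta$ is only a stand-in for a learned dictionary and is an assumption of this illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
m, k, n = 144, 60, 432                   # sizes used in the simulations
Phi = rng.standard_normal((k, m))        # random compressive sampling matrix
Theta = rng.standard_normal((k, n))      # random stand-in for a learned dictionary
FF = Phi.T @ Theta
FB = -Theta.T @ Theta

# Hypothesis 1 of Theorem 3: Theta Theta^T (k x k) is invertible.
full_rank = np.linalg.matrix_rank(Theta @ Theta.T) == k

# Hypothesis 2: FB is not a scalar multiple of FF^T FF.
B = FF.T @ FF
t_star = np.trace(FB.T @ B) / np.linalg.norm(B) ** 2
rel_gap = np.linalg.norm(FB - t_star * B) / np.linalg.norm(FB)
```

A `rel_gap` well above zero means that no scalar rescaling of `FF.T @ FF` reproduces the feedback weights, so by Theorem 3 the receptive fields cannot be a scalar multiple of the feedforward weights.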
5 Discussion and Conclusions



Here we have proposed adaptive compressed sensing (ACS), a new scheme of learning under com- 
pressed sensing that forms a dictionary adapted to represent the compressed data optimally. The 
coding and learning scheme of ACS can be formulated as a neural network, building on an earlier 
model of sparse coding [6]. Our model employs learning in the weights of the coding circuit and 
keeps the random projection fixed, as opposed to a previous suggestion using learning in the random 
projection to optimize the compression performance [16].

Our study focuses on the application of the proposed learning scheme to understand how corti- 
cal regions in ascending sensory pathways can analyze and represent signals they receive through 
thalamo-cortical or cortico-cortical connections. Conventional sparse coding theories were suc- 
cessful in reproducing physiological responses in primary sensory regions but they require exact 
matches between feedforward connections and receptive field patterns of cortical neurons (see
Theorem 1 and Fig. 1 for an example). Although it has been shown that thalamocortical wiring is to
some extent specific [10, 11], exact matches between feedforward circuitry and receptive fields are
not supported by experimental data. In addition, a recent quantitative study of cortico-cortical
projections suggests that the number of fibers reaching a target area can only be a fraction of
the local neurons in the area of origin [12].

We have tested whether or not the ACS network could serve as a computational model for how 
cortical areas can form a representation of data received through afferent projections that sample 
the activity pattern in the previous stage only incompletely and not in a one-to-one fashion. Our 
simulation experiments demonstrate that ACS can form representations of visual data even though,
unlike in conventional sparse coding models, the coding circuit receives only a subsampled version
of the original data. Further, we have demonstrated that the algorithm is stackable in a
hierarchy. The sparse code formed by ACS in a primary sensory area, when sent through another
compressing projection, can be decoded in a secondary sensory area into another meaningful visual
representation. The simulation experiment shown is not a realistic model for the V1-V2 cascade in
the visual stream since it omits the internal nonlinear operations in the primary region, such as
the transformation from simple cells to complex cells, or operations based on pattern similarity
[17, 18]. Rather, the simulation results are a proof of concept showing how the ACS model can
serve as a generic building block explaining what information is computed and sent between
cortical areas. The scheme consists of repeated cycles of compression and expansion into
high-dimensional sparse codes. That is, a sparse local representation is compressed, sent through
cortico-cortical projections, and expanded to a sparse local representation again. A communication
scheme using compression-expansion cycles is reminiscent of Braitenberg's picture of the pump of
thought [19], and it can potentially reconcile the debate over whether representations in the
brain are sparse [20, 21] or dense [22, 23], since the type of code could be lamina-specific.

In addition, we mathematically characterized the differences between the ACS model and conven- 
tional sparse coding models. Theorem 1 proves that, in conventional sparse coding models, the 
feedforward weights resemble the receptive fields of the neurons very closely. Theorem 4 proves
that ACS belongs to a class of sparse coding networks in which the receptive fields are dissimilar 
from the feedforward weights. Thus, we conclude that ACS belongs to a new class of competitive 
circuits derived from efficient sparse coding in which the recurrent weights are important for shaping 
the spatial receptive fields when learning on subsampled data. Regarding the still debated role of 
recurrent circuitry in producing orientation selectivity (e.g., [24, 25, 26]), the ACS model suggests
that if the input subsamples the data then feedback in shaping the receptive fields becomes essential 
for coding efficiency. 